{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "茅台网上销售渠道（http://www.china-moutai.com/xinwen/2014/658.html）：\n",
    "\n",
    "    1、茅台商城　https://www.emaotai.cn;\n",
    "    \n",
    "　　2、天猫茅台官方旗舰店　http://maotai.tmall.com;\n",
    "  \n",
    "　　3、国酒茅台阿里巴巴旗舰店　http://emaotai.1688.com;\n",
    "  \n",
    "　　4、工行融e购:　茅台商城官方旗舰店 ;\n",
    "  \n",
    "　　5、建行善融商城:　茅台商城官方旗舰店 ;\n",
    "  \n",
    "　　6、国美在线:　茅台商城官方旗舰店 ;\n",
    "  \n",
    "　　7、苏宁易购:　茅台商城官方旗舰店 ;\n",
    "  \n",
    "　　8、京东商城:　茅台商城官方旗舰店 ;\n",
    "  \n",
    "　　另外，我公司授权\"京东商城\"在其官网 www.jd.com 销售贵州茅台酒股份有限公司产品。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.茅台商城爬虫https://www.emaotai.cn\n",
    "\n",
    "包含商品名、品牌、价格、浏览数、已销售数、库存信息等"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "import time\n",
    "from bs4 import BeautifulSoup\n",
    "import csv, codecs\n",
    "from tqdm import tqdm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "base_url = 'https://www.emaotai.cn/'\n",
    "base_urls = ['https://www.emaotai.cn/browse/category-60.htm?flavortype=&pageindex=' + str(i) for i in range(1, 5)]\n",
    "data_date = time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time()))\n",
    "\n",
    "headers = {\n",
    "    'User-Agent': 'Mozilla/5.0'\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def get_page(url):\n",
    "    r = requests.get(url, headers=headers)\n",
    "    soup = BeautifulSoup(r.text, 'lxml')\n",
    "    content = soup.find('div', {'class':'category_pro_list'})\n",
    "    urls = set()\n",
    "    for i in content.find_all('a'):\n",
    "        if 'href' in i.attrs:\n",
    "            urls.add(base_url + i.attrs['href'])\n",
    "    return urls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def get_product_info(url):\n",
    "    r = requests.get(url, headers=headers)\n",
    "    soup = BeautifulSoup(r.text, 'lxml')\n",
    "    content = soup.find('div', class_=\"product_parameter\")\n",
    "\n",
    "    title = content.find('span').get_text()  #商品名\n",
    "    item = content.find('tr', class_=\"product_para2\")\n",
    "    item_2 = item.find_all('span')\n",
    "    list_1 = []\n",
    "    for i in item_2:\n",
    "        list_1.append(i.get_text())\n",
    "    product_id = list_1[0]  #商品编号\n",
    "    brand = list_1[1]  #品牌\n",
    "    price = list_1[2]  #价格\n",
    "    view_count = list_1[6][:-2]  #浏览次数\n",
    "    sell_count = list_1[9]  #已售出量\n",
    "\n",
    "    item_3 = content.find('div', class_=\"product_para_num\")\n",
    "    s_1 = item_3.get_text().strip()\n",
    "    if '真品保证' in s_1:\n",
    "        store_count = '仅供展示'\n",
    "    else:\n",
    "        store_count = s_1.split()[-1][4:][:-1]  #库存数\n",
    "    \n",
    "    product_info = (data_date, title, product_id, brand, price, view_count, sell_count, store_count)\n",
    "    return product_info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|████████████████████████████████████████████| 4/4 [00:07<00:00,  2.13s/it]\n"
     ]
    }
   ],
   "source": [
    "product_urls = set()\n",
    "for i in tqdm(base_urls):\n",
    "    urls = get_page(i)\n",
    "    product_urls = product_urls | urls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████████████████████████████████████| 81/81 [01:15<00:00,  1.26it/s]\n"
     ]
    }
   ],
   "source": [
    "data = []\n",
    "for url in tqdm(product_urls):\n",
    "    prooduct_info = get_product_info(url)\n",
    "    data.append(prooduct_info)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|███████████████████████████████████████| 81/81 [00:00<00:00, 40478.81it/s]\n"
     ]
    }
   ],
   "source": [
    "with codecs.open(r'data/emaotai.csv', 'a', encoding='utf_8_sig') as f:\n",
    "    csv_file = csv.writer(f, dialect='excel')\n",
    "    for i in tqdm(data):\n",
    "        csv_file.writerow(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "navigate_num": "#000000",
    "navigate_text": "#333333",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700",
    "sidebar_border": "#EEEEEE",
    "wrapper_background": "#FFFFFF"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "12px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false,
   "widenNotebook": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
